Wikipedia:Wikipedia Signpost/Single/2012-09-24
Editor's response to Roth draws Internet attention
Oliver Keyes' (User:Ironholds) defense of Wikipedia against the recent Philip Roth controversy has drawn a significant amount of attention over the last week.
The problems between Roth, a widely known and acclaimed American author, and Wikipedia arose from an open letter he penned for the American magazine The New Yorker, and were covered by the Signpost two weeks ago. Keyes—who wrote the piece as a prominent Wikipedian but is also a contractor for the Wikimedia Foundation—wrote a blog post on the topic, lamenting the factual errors in Roth's letter and criticizing the media for not investigating his claims: "[they took] Roth’s explanation as the truth and launched into a lengthy discussion of how we [Wikipedia] handle primary sourcing."
The post quickly drew large amounts of attention, in no small part due to a tweet from Jimmy Wales ("Attention journalists: worth reading."), who has over 74,000 followers on the site.
Keyes found four major problems with Roth's piece, with most being based on factual differences between the chain of events, as documented in the article's history, and Roth's account. The fourth issue was more touchy, with Keyes asserting that Roth did not use "the normal channels"—i.e. the Open-source Ticket Requests System (OTRS)—which he claimed is on the "contact us page that readers are linked to every time they open any wikipedia page ever" (emphasis in original), but the actual email address is buried in subpages.
Keyes concludes that Wikipedia's processes worked in this case. He believes that allowing article subjects to simply email Wikipedia and have us change something, just because they said so, is wrong, because verifiability cannot be compromised—or in Keyes' words, "[we try to] ensure that readers have a hope in hell of actually checking the accuracy of our information. ... We don’t want readers to trust us. We want readers to think and be able to do their own research."
He elaborated on this view in a follow-up blog post, setting up a hypothetical situation where Wikipedia has instituted an email notification system for article subjects to provide the 'real' facts for their articles. While Keyes acknowledges the potential benefits, especially given our current policy that values verifiability over 'the truth', he quickly shows why any model of this sort is untenable: what if an article subject wanted to falsely 'correct' their article to make themselves look better?
In what may have been the strongest language used by Keyes, he finished the post by condemning the media for simply accepting Roth's claims with no investigation of their own:
[P]eople should perhaps start having a debate about the way authors are treated in "proper" sources. The New Yorker, the Guardian, ABC News and the Los Angeles Times – all respected bodies. And all, without being able and/or willing to do their own research, happily published or republished Roth’s assertions. We rely on these organisations for reporting what our politicians do, what our armed forces do, how entities with the power of life and death over humanity are accountable to the people. And they happily gulp down the glorified press releases of anyone who offers to let them touch his Pulitzer.
And you think Wikipedia is what we should be concerned about? Fuck. That. Noise.
*drops mic*
In brief
- Baltimore Sun journalist and copyeditor John E. McIntyre has published a blog post critical of Wikipedia, urging readers to "have serious misgivings about Wikipedia as a reference." His column follows a recent story by McIntyre focusing on "sham editing" as part of the Roth controversy, and is in line with opinions he expressed in 2007 and January 2009.
"Rise and decline" of Wikipedia participation, new literature overviews, a look back at WikiSym 2012
"The rise and decline" of the English Wikipedia
A paper to appear in a special issue of American Behavioral Scientist (summarized in the research index) sheds new light on the English Wikipedia's declining editor growth and retention trends. The paper describes how "several changes that the Wikipedia community made to manage quality and consistency in the face of a massive growth in participation have lead to a more restrictive environment for newcomers".[1] The number of active Wikipedia editors has been declining since 2007 and research examining data up to September 2009[2] has shown that the root of the problem has been the declining retention of new editors. The authors show this decline is mainly due to a decline among desirable, good-faith newcomers, and point to three factors contributing to the increasingly "restrictive environment" they face.
First, Wikipedia is increasingly likely to reject desirable newcomers' contributions, be it in the form of reverts or deletions. Second, it is increasingly likely to greet them with impersonal messages; the authors cite a study that shows that by mid-2008 over half of new users received their first message in a depersonalized format, usually as a warning from a bot or from an editor using a semi-automated tool.[3] They show a correlation between the growing use of various depersonalized tools for dealing with newcomers and the dropping retention of newcomers. The authors speculate that unwanted but good-faith contributions were likely handled differently in the early years of the project: unwanted changes were fixed and non-notable articles were merged. Startlingly, the authors find that a significant number of first-time editors will ask about their reverted edit on the talk page of the article in question, only to be ignored by the Wikipedians who reverted them. Specifically, editors who use vandal-fighting tools like Huggle or Twinkle are increasingly unlikely to follow the Wikipedia:Bold, revert, discuss cycle and respond to discussions about their reverts.
As a third factor, the authors note that the majority of Wikipedia rules were created before 2007 and have not changed much since. New editors thus face an environment in which they have little influence on the rules that govern their behavior and, more importantly, on how others should behave toward them. The authors note that this violates Ostrom's 3rd principle for stable local common pool resource management, by effectively excluding a group that is very vulnerable to certain rules from being able to influence them.
The authors recognize that automated tools and extensive rules are needed to deal with vandalism and manage a complex project, but they caution that the currently evolved customs and procedures are not sustainable for the long term. They suggest Wikipedia editors could copy the strategy of distributed, automated tools that have proven so effective at dealing with vandalism (e.g. Huggle & User:ClueBot NG) to build tools that aid in identifying and supporting desirable newcomers (a task in which Wikipedia increasingly fails[4]). Further, they recommend that newcomers be given a voice, if indirectly via mentors, when it comes to how rules are created and applied.
Overall, the authors present a series of very compelling arguments. This reviewer's only complaint is that (even though three of the four were among the Wikimedia Foundation's visiting researchers for the Summer of Research 2011) they do not discuss the fact that the Foundation and the wider community have recognized similar issues and have engaged in debates, studies, pilot programs and the like aimed at remedying them (see for example the WMF Editor Trends Study).
Literature reviews of Wikipedia's inputs, processes, and outputs
Nicolas Jullien's "What we know about Wikipedia. A review of the literature analyzing the project(s)"[5] is an attempt at a "comprehensive" literature review of academic research on Wikipedia. Jullien works to distinguish his literature review from previous attempts like those of Okoli and collaborators (cf. earlier coverage: "A systematic review of the Wikipedia literature") and of Park which tend to split the literature into three main themes: (1) motivations of editors to contribute and relationship between motivation and contribution quality, (2) editorial processes and organization and its relationship to quality and (3) the quality and reliability of production.
Jullien builds on this basic framework by Carillo and Okoli, but distinguishes his from their work in several ways. First, Jullien holds that previous work has focused too little on the outputs, which his analysis emphasizes more. Second and crucially, Jullien's review is not limited to material published in journals and, as a result, is more representative of fields like computer science, HCI, and CSCW, which publish many of their most influential articles in conference proceedings. Jullien does not consider articles on how Wikipedia is used, questions of tools and their improvement, and studies that only use Wikipedia as a database (e.g., to test an algorithm). Other than this, the study is not limited to any particular field. It covers articles published in English, French and Spanish before December 2011, mostly based on searches in Web of Science and Scopus (sharing the search query used in the latter). The review is structured around inputs, processes, and outputs.
In terms of inputs, Jullien considers cultural factors in the broader environment and questions of why people choose to participate or join Wikipedia. In terms of process, he considers questions about the activities and roles of contributors, the social (e.g., network) structure of both the projects and the individuals who participate, the role of teams and the organization of people within them, the processes around editing, creation, deletion, and promotion of articles with a particular focus on conflict, and questions of management and leadership. In terms of outputs, the paper divides publications into studies of process, Wikipedia user experience, the external evaluation of Wikipedia articles, and questions of Wikipedia coverage.
A second recent preprint by Taha Yasseri and János Kertész [6] likewise gives an overview of vast areas of recent research about Wikipedia. Subtitled "Sociophysical studies of Wikipedia" and citing 114 references, it compares some of the authors' own results on e.g. editing patterns (covered in several past issues of this research report, e.g.: "Dynamics of edit wars") with existing literature. The review focuses on quantitative data-driven analyses of Wikipedia production, reproduces and reports a series of previous analyses, and extends some of the earlier findings.
After a detailed description of how Wikipedia works, the authors walk through a series of types of quantitative analyses of patterns of editing to Wikipedia. They use "blocking" of edits to characterize good and "bad" editors and describe different editing patterns between these groups. The authors show that editors, in general, tend to edit in a "bursty" pattern with long periods of breaks and that editing tends to follow daily and weekly patterns that vary by culture. They also walk through several approaches for classifying edits by type, and discuss the characterization of linguistic features with an emphasis on readability.
Much of their article is focused on the issue of conflicts and edit warring. The authors pay particular attention both to the identification of conflicts and of controversial articles and topics and to characterizing the nature of edit warring itself. The paper ends with the description of an agent-based model of edit warring and conflict.
WikiSym 2012: overview report
The International Symposium on Wikis and Open Collaboration ("WikiSym 2012") was held August 27–29 in Linz, Austria. The three-day conference featured research papers, posters and demonstrations, and open space discussion sessions. About 80 researchers and wiki experts from around the world attended.
WikiSym is an academic conference, now in its eighth year, that seeks to highlight research on wikis and open collaboration systems. This year’s WikiSym had a strong focus on Wikipedia research, with studies that ranged from analyzing breaking news articles on Wikipedia to looking at the behavior of Wikipedia editors and how long they stay active. In all, 17 papers focused on Wikipedia or MediaWiki, and the two keynotes also focused on Wikipedia research.
The first keynote session was given by Jimmy Wales, who discussed challenges for Wikipedia and potential research questions that matter to the Wikimedia community [2][3]; Wales focused particularly on questions around diversity of the editing body, how to grow small language communities, and how to retain editors. The closing keynote was given by Brent Hecht, a researcher from Northwestern University, who spoke on techniques for making multilingual comparisons of content across Wikipedia versions, which in turn allows researchers to identify the potential cultural biases of various Wikipedia editions. Hecht found, for instance, that (looking at interwiki links across 25 languages) the majority of Wikipedia article topics appear in only one language; that the overlap between major language editions is relatively small; and that the depth of geographical representation varies widely by language, with a bias towards representing the country or place where that edition's language is prominent. Hecht also compared articles on the same topic across Wikipedias to see the degree of similarity between them. Hecht described his work as "hyperlingual", developing techniques to gain a broader perspective on Wikipedia by looking across language editions. His content comparison tool can be seen at the Omnipedia site, and the WikAPIdia API software he developed can be downloaded here. (See also earlier coverage about Omnipedia: "Navigating conceptual maps of Wikipedia language editions")
In addition to the presented papers, some of which are profiled below, WikiSym has a strong tradition of hosting open space sessions in parallel with the main presentations, so that attendees can discuss topics of interest. This year’s open space topics included helping new wiki users; non-text content in wikis (including videos, images, annotations, slideshows and slidecasting); the future of WikiSym; Wikipedia bots; surveying Wikipedia editors; and realtime wiki synchronization and multilingual synchronization feedback. The conference closed with a panel session entitled "What Aren't We Measuring?", where panelists discussed and debated various methods for quantifying wiki-work (by studying editors, edits, and other metrics).
This year's WikiSym was hosted at the Ars Electronica Center in Linz, a "museum of the future" that hosts the Ars Electronica festival every year. The colorful, dramatic Ars Electronica building is in the heart of Linz, so outside of sessions conference attendees enjoyed exploring and socializing in the city center. The conference dinner was held at the Pöstlingberg Schlössl, which is accessed by one of the steepest mountain trams in the world.
WikiSym 2012 papers and poster and demonstration abstracts may be downloaded from the conference website. Next year’s WikiSym is planned for Hong Kong, just before Wikimania 2013. Updates on the schedule and important dates can be found on the WikiSym blog.
On the "Ethnography Matters" blog, participant Heather Ford looked back at the conference,[7] stating that "WikiSym is dominated by big data quantitative analyses of English Wikipedia", asking "where does ethnography belong?" and counting 82% of the Wikipedia-related papers as examining the English Wikipedia and only 18% as examining other language editions. A panel at WikiSym 2011 had called for broadening research to other languages (see last year's coverage: "Wiki research beyond the English Wikipedia at WikiSym").
WikiSym 2012 papers
Apart from several that have been covered in earlier issues of this report, the conference papers and posters included:
- {{Citation needed}}: The dynamics of referencing in Wikipedia[8]: This paper contributes to the debates on Wikipedia's reliability. The authors find that density of references is correlated with the article length (the longer the article, the more references it will have per given amount of text). They also find that references attract more references (suggesting a form of a snowball mechanism at work) and that the majority of references are added in short periods of time by editors who are more experienced, and who are also adding substantial content. The authors thus conclude that referencing is primarily done by a small number of experienced editors, who prefer to work on longer articles, and who drastically raise the article's quality, by both adding more content, and by adding more references.
- Etiquette in Wikipedia: Weening [sic] New Editors into Productive Ones[9]: The authors of this paper experimented with alternative warning messages, introducing a set of shorter and more personalized warnings into those delivered by Huggle in the period from November 8 to December 9, 2011. Unfortunately, the authors are rather unclear on how exactly the Huggle tool was influenced, and whether the community was consulted on that. In fact, the community and Huggle developers were aware of, discussed and approved of this experiment (here or here), but the paper's failure to clarify this can lead to some confusion with regard to research ethics, since a casual reader may assume the researchers have hijacked Huggle without consulting the community. The wording changes were made in good faith (making the messages more personalized, friendly and short), and the authors conclude that the new messages they tested were more effective at positively influencing new editors who received Level 1 warnings.
- WikiTrust algorithm applied to MediaWiki programmers: A paper titled "Towards Content-driven Reputation for Collaborative Code Repositories"[10] reports on an experimental application of the well-known WikiTrust algorithm to the collaboration of programmers on a code repository, namely MediaWiki's own SVN codebase (from 2011, before it was switched to Git). In that model, contributors lose reputation when their contributions are reverted or deleted. According to the abstract, "Analysis is particularly attentive to reputation loss events and attempts to establish ground truth using commit comments and bug tracking. A proof-of-concept evaluation suggests the technique is promising (about two-thirds of reputation loss is justified) with false positives identifying areas for future refinement." An example of such false positives is "The “not now” trap: Frequently a change is reverted with a 'not now' justification, e.g., needing to hold for more testing. When that testing is done the changes are likely to be re-committed in much the same form, punishing the benign reverting editor."
- "Deletion Discussions in Wikipedia: Decision Factors and Outcomes"[11] found among other things that "69.5% of discussions and 91% of comments are well-represented by just four factors: Notability, Sources, Maintenance and Bias. The best way to avoid deletion is for readers to understand these criteria." One of the authors also co-presented a demo showing mock-ups of possible "alternative interfaces for deletion discussions in Wikipedia"[12], which would highlight the prevalence of each type of argument (e.g. notability, sourcing...) in a deletion discussion more clearly.
- "Classifying Wikipedia articles using network motif counts and ratios"[13]: Similar to an earlier paper by the same authors (earlier coverage: "Collaboration pattern analysis: Editor experience more important than 'many eyes'"), this paper examined the collaboration network of Wikipedia articles and editors using network motifs: small graphs which occur particularly frequently as sub-graphs of networks of a certain kind, and can be regarded as their building blocks in some sense. This was then related to the quality ratings of articles: "Pages with good quality scores [e.g. featured articles] have characteristic motif profiles, but pages with good user ratings [from the Article Feedback tool] don’t. This suggests that a good quality score is evidence that a collaborative curation process has been pursued. However, not all pages with high quality scores get good user ratings and some pages with low quality scores are trusted by users. Perhaps the Wikipedia quality scale is a low error scale rather than a quality scale?"
- "'Writing up rather than writing down': becoming Wikipedia Literate"[14] applied "the work of literacy practitioner and theorist Richard Darville" to communication among Wikipedians, e.g. new users and experienced users who deleted some of their contributions. "Using a series of examples drawn from interviews with new editors and qualitative studies of controversies in Wikipedia, we identify and outline several different literacy asymmetries."
- "How long do wikipedia editors keep active?"[15] found that on the English Wikipedia, "although the survival function of occasional editors roughly follows a lognormal distribution, the survival function of customary editors can be better described by a Weibull distribution (with the median lifetime of about 53 days). Furthermore, for customary editors, there are two critical phases (0–2 weeks and 8–20 weeks) when the hazard rate of becoming inactive increases".
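To make the survival-analysis terms in the last item more concrete, here is a minimal sketch of a Weibull survival function and hazard rate. Only the 53-day median comes from the quoted abstract; the shape parameter is a hypothetical value chosen purely for illustration, and the function names are invented for this example rather than taken from the paper.

```python
import math


def weibull_scale_from_median(median, shape):
    """Return the Weibull scale (lambda) that yields the given median for shape k."""
    # The Weibull median is lambda * (ln 2) ** (1 / k), so solve for lambda.
    return median / math.log(2) ** (1.0 / shape)


def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / lambda) ** k): share of editors still active after t days."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, scale, shape):
    """h(t) = (k / lambda) * (t / lambda) ** (k - 1), defined for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


MEDIAN_DAYS = 53.0    # median lifetime reported in the quoted abstract
ASSUMED_SHAPE = 0.8   # hypothetical shape parameter, NOT a value from the paper

scale = weibull_scale_from_median(MEDIAN_DAYS, ASSUMED_SHAPE)
share_active_at_median = weibull_survival(MEDIAN_DAYS, scale, ASSUMED_SHAPE)
```

By construction, the survival function equals one half at the median. A single Weibull curve has a hazard rate that only rises or only falls over time, so this sketch does not reproduce the two critical phases the authors report.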
"First Monday" on rhetoric, readability and teaching
First Monday, the veteran open access journal about Internet topics, featured three Wikipedia-themed papers in its September issue:
- AfD rhetoric examined: "The pentad of cruft: A taxonomy of rhetoric used by Wikipedia editors based on the dramatism of Kenneth Burke"[16] is an essay "describing a method for classifying arguments made by Wikipedia editors based on the theory of "dramatism", developed by the literary theorist Kenneth Burke, and demonstrating how this method can be applied to a small sample of arguments drawn from Wikipedia’s 'Article for Deletion' (AfD) process."
- "Readability of Wikipedia"[17] applied the standard Flesch Reading Ease test to the English and Simple English Wikipedias (at http://www.readabilityofwikipedia.com/ , the authors also offer the possibility to view scores directly). The effort, described as "extensive research" in a university press release, found that "overall readability is poor, with 75 percent of all articles scoring below the desired readability score. The "Simple English" Wikipedia scores better, but its readability is still insufficient for its target audience." See also the detailed earlier Signpost coverage: "Readability of Simple English and English Wikipedias called into question", and the summary of an earlier paper which applied a more diverse set of readability measures to both Wikipedias: "Simple English Wikipedia is only partially simpler/controversy reduces complexity".
- "Wikis and Wikipedia as a teaching tool: Five years later"[18] by longtime Wikipedian (and contributor to this research newsletter) Piotr Konieczny first gives an overview of the now widespread use of Wikipedia in the classroom and its advantages, and in a second part offers detailed practical advice drawing from the author's own "five years of experience in teaching with wikis and Wikipedia and holding workshops on the subject".
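For readers unfamiliar with the Flesch Reading Ease test used in the readability paper above, the following is a minimal sketch of the standard formula. The syllable counter is a crude vowel-group heuristic assumed here for illustration only; it is not the method used by the study's authors, and the function names are invented.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Standard Flesch Reading Ease score; higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Encyclopedic verifiability necessitates comprehensive documentation."
)
```

Short sentences made of one-syllable words score high, while a single sentence of long words scores far lower. Real readability tools use more careful sentence splitting and syllable counting, so scores from this sketch should not be compared with the paper's figures.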
Briefly
- Recent changes visualization designed to assist admins: A paper titled "Feeling the Pulse of a Wiki: Visualization of Recent Changes in Wikipedia"[19] will be presented at the upcoming conference "VINCI 2012 : The International Symposium on Visual Information Communication and Interaction". It describes prototype software (apparently not publicly available yet) that is designed "to aid a wiki administrator to perceive current activity in a wiki", starting from the idea of mapping editors and articles in two dimensions: time and activity level. Hosted on the Toolserver, the software directly accesses a wiki's Recent Changes table, containing edits from the last 30 days. Using their tool, the authors visually discerned "six common editing patterns" on the English Wikipedia. E.g. "New article, many editors, many edits: this is the new popular article pattern which almost invariably reflects a current event". The authors also compare their tool to the previous "few and limited efforts" to visualize recent changes: WikipediaVision, Wikipulse and Wikistream.
- Unearthing the "actual" revision history of a Wikipedia article: A paper[20] by two researchers from Waseda University observes that "Unlike what is very common in software development, Wikipedia does not maintain an explicit revision control system that manages the detailed change through revisions. The chronologically-organized edit history fails to reveal the meaningful scenarios in the actual evolution process of Wiki articles, including reverts, merges, vandalism and edit wars". To extract this "actual" revision graph, where two neighboring nodes correspond to a revision and an earlier one which it was derived from, a similarity measure is needed. The article cites a 2007 paper and other research which had already proposed to understand a page's revision history as a directed tree and used similarity measures such as tf-idf. The present paper uses a similarity measure based on the frequency of n-grams (sequences of n words) and goes further in regarding the revision history as a directed acyclic graph. This allows for version merges, although the actual algorithm presented still focuses on the case of trees.
- Who deletes Wikipedia – or reverts it: Wibidata, a big data analytics startup based in San Francisco, posted a follow-up[21] to their "Who deletes Wikipedia" analysis (previous coverage), taking into account the effect of reverts, which several Wikipedians had pointed out in response to their earlier blog post.
- Geospatial characteristics of Wikipedia articles: The authors of this paper attempt to identify what makes Wikipedia articles with geographical coordinates different from others (besides their obvious relation to geographical locations).[22] They rather unsurprisingly find that more developed articles are more likely to have geo-coordinates, and consequently that there seems to be a correlation between article quality and having geo-coordinates. They also find that articles with geo-coordinates are more likely to be linked to, probably because they are of above-average quality.
- Wikipedia's affordances: This paper, framing itself as part of the ecological psychology field, contributes to the discourse about affordances (properties of an object that allow one to take a certain action).[23] The authors submit that this concept can be developed to further our understanding of how individuals perceive their socio-technical environment. The authors refine the term "technology affordances", which they define as "functional and relational properties of the user-technology system". They then use Wikipedia as their case study in an attempt to demonstrate the concept's value, listing six affordances of Wikipedia (in other words, six actions that editors of Wikipedia can take): contribution, control, management, collaboration, self-presentation, and broadcasting.
- Hematologists unsure whether "to engage with Wikipedia more constructively": A letter[24] to the medical journal BMJ asks "Should clinicians edit Wikipedia to engage a wider world web?" The authors, a student and a senior lecturer in the field of haematology, "simulated 30 opportunistic internet searches for information on haemophilia in the top three search engines using term permutations: haemophilia or hemophilia (with or without A or B); carrier; information; child; treatment. Wikipedia was the most commonly found top 10 site in all search engines." In an apparent attempt to gauge the authoritativeness of Wikipedia content, "Analysis of editorial authorship of the Haemophilia Wiki [sic] for four weeks found 39 edits by 25 editors, only nine of whom had a profile, and none of whom were experts in haemophilia." Possibly unaware of Wikipedia's "no original research" policy, the authors ask "Given the evolving debate about open access to data, should publishers and authors be mandated to place reviews and key studies [...] in a public domain like Wikipedia?" (naming the example of a recent prominent paper in the field, which the Wikipedia article cites only in form of a New York Times news article about it). The letter concludes "as a professional group, we are not sure whether we wish to engage with Wikipedia more constructively". One-day access to the letter, which is around half a page long, can be purchased at £20/$30/€32 plus VAT, which may not be a very competitive price given the availability of more thorough evaluations of Wikipedia's quality elsewhere in the academic literature.
- Tracking and verifying sources on Wikipedia: Ethnographer Heather Ford published the final report from her study on how editors track and verify sources on Wikipedia.[25] The report presents an in-depth qualitative analysis of editor discussions around verifiability of information in the early editing phase of the 2011–2012 Egyptian revolution article and reviews how Wikipedia policies around primary vs secondary sources, notability and neutrality were used to make decisions about what sources to cite.
- A recommender system for infoboxes: A team of computer science researchers at the University of Texas at Arlington developed a classification method to predict infobox template types for articles lacking them, using three types of features: words in articles, categories, and named entities (or words with corresponding Wikipedia entries). The study suggests that articles with infoboxes and articles without infoboxes exhibit substantially different distributions of the above features. The classifier was tested on data from a 2008 dump of the English Wikipedia.[26]
- Styles of information search on Wikipedia: A poster presented at the 2nd European Workshop on Human-Computer Interaction and Information Retrieval presents the results of an eye-tracking study looking at patterns of information search in Wikipedia articles. The study looks at task-specific differences in the context of factual information lookup, learning and casual reading activity.[27]
- Post-edit feedback experiment: The Wikimedia Foundation's "Editor Engagement Experiments" team reported[28] on an experiment with a simple user interface change – adding messages that confirm that an edit has been saved – and its effect on the contributions of new editors.
- Pilot study about Wikipedia's quality compared to other encyclopedias: The results of a pilot study commissioned by the Wikimedia Foundation, titled "Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic"[29] have been announced.
- Wikipedia, the first step toward communism: Sylvain Firer-Blaess and Christian Fuchs, in their "info-communist manifesto", argue that Wikipedia is an example of the communist mode of production and participatory democracy—"the brightest info-communist star on the Internet’s class struggle firmament". They suggest that Wikipedia's future will be a choice between co-option into the broader capitalist economy (through the exploitation of the commercial possibilities of Wikipedia's free licensing) or, alongside similar "info-communist" projects, displacing more and more capitalist production of informational goods.[30]
- Quality flaw detection competition: Maintenance templates on the English Wikipedia (e.g. "citation needed") have attracted the attention of several researchers recently, as easy to parse indicators of quality problems (example). An "Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia"[31] summarizes its outcome as follows: "three quality flaw classifiers have been developed, which employ a total of 105 features to quantify the ten most important quality flaws in the English Wikipedia. Two classifiers achieve promising performance for particular flaws. An important 'by-product' of the competition is the first corpus of flawed Wikipedia articles, the PAN Wikipedia quality flaw corpus 2012 (PAN-WQF-12)", which consists of "1 592 226 English Wikipedia articles, of which 208 228 have been tagged to contain one of ten important quality flaws". One of the two "winners", the "FlawFinder" algorithm, has been described in a paper covered last month. The competition took place on occasion of the CLEF 2012 conference, as did the first Wikipedia Vandalism Detection competition two years ago (Signpost coverage).
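The revision-history item above describes linking each revision to the earlier revision it was most likely derived from, using a similarity measure based on word n-gram frequencies. The following is a minimal sketch of that idea for the tree case. The cosine measure, the choice of n = 2, the tie-breaking rule and all function names are assumptions made for illustration; the paper's actual measure and algorithm may differ.

```python
import math
from collections import Counter


def ngram_counts(text, n=2):
    """Count word n-grams (sequences of n consecutive words) in a revision."""
    words = text.lower().split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def infer_parents(revisions, n=2):
    """For each revision, pick the most similar earlier revision as its parent.

    Returns a list in which entry i is the index of the inferred parent of
    revision i, or None for the first revision. Ties go to the later candidate.
    """
    if not revisions:
        return []
    profiles = [ngram_counts(text, n) for text in revisions]
    parents = [None]
    for i in range(1, len(profiles)):
        best = max(
            range(i),
            key=lambda j: (cosine_similarity(profiles[i], profiles[j]), j),
        )
        parents.append(best)
    return parents


history = [
    "the quick brown fox jumps",
    "the quick brown fox jumps over the lazy dog",
    "buy cheap pills now",
    "the quick brown fox jumps over the lazy dog today",
]
parents = infer_parents(history)
```

In this toy history the fourth revision is attached to the second one rather than to the vandalized third, which is the kind of revert structure a purely chronological edit list does not show. Supporting merges would require allowing more than one parent per revision, turning the tree into a directed acyclic graph.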
References
- ^ Halfaker, A., Geiger, R.S., Morgan, J. and Riedl, J. (2012), The Rise and Decline of an Open Collaboration Community, American Behavioral Scientist, forthcoming. HTML summary
- ^ http://strategy.wikimedia.org/wiki/Editor_Trends_Study
- ^ Geiger, R. S., Halfaker, A., Pinchuk, M., & Walling, S. (2012). Defense Mechanism or Socialization Tactic? Improving Wikipedia's Notifications to Rejected Contributors. ICWSM.
- ^ Musicant, D. R., Ren, Y., Johnson, J. A., & Riedl, J. (2011). Mentoring in Wikipedia: a clash of cultures. WikiSym 2011 (pp. 173–182). [1]
- ^ Jullien, N. (2012). What We Know About Wikipedia: A Review of the Literature Analyzing the Project(s). SSRN Electronic Journal. PDF
- ^ Yasseri, T., & Kertész, J. (2012). Value production in a collaborative environment. Physics and Society; Computers and Society; Data Analysis, Statistics and Probability. PDF
- ^ Ford, H. (2012) Where does ethnography belong? Thoughts on WikiSym 2012, Ethnography Matters HTML
- ^ Chen, C.-C. and Roth, C. (2012), {{Citation needed}}: The dynamics of referencing in Wikipedia, WikiSym '12 PDF
- ^ Faulkner, R., Walling, S. and Pinchuk, M. (2012), Etiquette in Wikipedia: Weening New Editors into Productive Ones, WikiSym '12 PDF
- ^ West, A.G. and Lee, I. (2012) Towards Content-driven Reputation for Collaborative Code Repositories, WikiSym '12 PDF
- ^ Schneider, J., Passant, A. and Decker, S. (2012) Deletion Discussions in Wikipedia: Decision Factors and Outcomes, WikiSym '12 PDF
- ^ Jodi Schneider, Krystian Samp: Alternative Interfaces for Deletion Discussions in Wikipedia: Some Proposals Using Decision Factors. Demo, WikiSym'12, August 27–29, 2012, Linz, Austria. ACM 978-1-4503-1605-7/12/08. PDF
- ^ Wu, G., Harrigan, M. and Cunningham, P. (2012) Classifying Wikipedia Articles Using Network Motif Counts and Ratios, WikiSym '12 PDF
- ^ http://wikisym.org/ws2012/bin/download/Main/Program/p21wikisym2012.pdf [bare URL PDF]
- ^ http://wikisym.org/ws2012/bin/download/Main/Program/p15wikisym2012.pdf [bare URL PDF]
- ^ Famiglietti, Andrew. The pentad of cruft: A taxonomy of rhetoric used by Wikipedia editors based on the dramatism of Kenneth Burke. First Monday [Online], (19 August 2012) HTML
- ^ Lucassen, Teun, Dijkstra, Roald, AND Schraagen, Jan Maarten. "Readability of Wikipedia" First Monday[Online], (20 August 2012) HTML
- ^ Konieczny, Piotr. "Wikis and Wikipedia as a teaching tool: Five years later" First Monday [Online], (25 August 2012) HTML
- ^ Robert P. Biuk-Aghai and Roy Chi Kit Chan (2012) Feeling the Pulse of a Wiki: Visualization of Recent Changes in Wikipedia, VINCI 2012, forthcoming PDF
- ^ Wu, J., & Iwaihara, M. (2012). Wikipedia Revision Graph Extraction Based on N-Gram Cover. In Z. Bao, Y. Gao, Y. Gu, L. Guo, Y. Li, J. Lu, Z. Ren, et al. (Eds.), Lecture Notes in Computer Science, 2012, Volume 7419 (Vol. 7419, pp. 29–38). Berlin, Heidelberg: Springer Berlin Heidelberg. DOI
- ^ Hougland, J. (2012) Reverting in Wikipedia, Wibidata blog HTML
- ^ Hahmann, S. and Burghardt, D. (2012), Investigation on factors that influence the (geo)spatial characteristics of Wikipedia articles PDF
- ^ Mesgari, M. and Faraj, S. (2012) Technology Affordances: The Case of Wikipedia, AMCIS 2012 PDF
- ^ Kint, M., & Hart, D. P. (2012). Should clinicians edit Wikipedia to engage a wider world web? BMJ (Clinical research ed.), 345, e4275. PDF
- ^ Ford, H. (2012) Wikipedia Sources: Managing Sources in Rapidly Evolving Global News Articles on the English Wikipedia, SSRN, August 2012. PDF
- ^ Sultana, A., Hasan, Q.M., Biswas, A.K., Das, S., Rahman, H., Ding, C. and Li, C. (2012), Infobox Suggestion for Wikipedia Entities, 21st ACM International Conference on Information and Knowledge Management (CIKM '12) PDF
- ^ Knäusl, H., Elsweiler, D. and Ludwig, B. (2012) Towards Detecting Wikipedia Task Contexts, 2nd European Workshop on Human-Computer Interaction and Information Retrieval, August 2012 PDF
- ^ Walling, S. and Taraborelli, D. (2012), Is this thing on? Giving new Wikipedians feedback post-edit, Wikimedia Blog HTML
- ^ Casebourne, I., Davies, C., Fernandes, M., Norman, N. (2012): Assessing the Accuracy and Quality of Wikipedia Entries Compared to Popular Online Alternative Encyclopaedias: A Preliminary Comparative Study Across Disciplines in English, Spanish and Arabic. PDF
- ^ Sylvain Firer-Blaess and Christian Fuchs (2012), Wikipedia: An Info-Communist Manifesto, Television & New Media, 12 September 2012 abstract
- ^ Maik Anderka and Benno Stein: Overview of the 1st International Competition on Quality Flaw Prediction in Wikipedia. In: Pamela Forner, Jussi Karlgren, and Christa Womser-Hacker (Eds.): CLEF 2012 Evaluation Labs and Workshop – Working Notes Papers, 17–20 September, Rome, Italy. ISBN 978-88-904810-3-1. ISSN 2038-4963. 2012. PDF
Reader comments
01010010 01101111 01100010 01101111 01110100 01101001 01100011 01110011
This week, we tinkered with WikiProject Robotics. From the project's inception in December 2007, it has served as Wikipedia's hub for building and improving articles about robots and robotics, accumulating two Featured Articles and seven Good Articles along the way. The project covers both fictitious and real-life robots, the technology that powers them, and many of the brains behind the robotics field. Between improving articles listed in the extensive index of robotics and adding new content to the more stimulating Robotics Portal, there is no shortage of tasks the WikiProject could use help with. We interviewed Chaosdruid and N2e.
What motivated you to join WikiProject Robotics? Do you work on mechanical devices in real life?
- Chaosdruid: I have been interested in robotics since I was a youngster. I had a wide variety of interests, from computing through environmental issues to space exploration, but robotics seems to crop up in all of these areas. I suppose Isaac Asimov was the first real robotics influence in my teens, as I am sure he was for many, when I was reading SF novels. Him and Ray Bradbury led me down a path of realisation that robots were probably one of the things that would greatly affect humanity's future.
- Unfortunately I do not work on them in real life, I am more computer-oriented in my work and personal life. Robotics would have been something I would have loved to have studied, but during my college and University years, in the late 70s/early 80s, the UK was suffering a general lack of interest in funding robotics due to the Lighthill debate and the ensuing lack of funding for AI technology, such as robotics and computing, referred to as the AI winter. I later discovered that there was a small group in Scotland at the University of Edinburgh that managed to progress quite far and kicked myself for not having discovered them when I was due to start Uni.
- N2e: I have been a long-time member of WikiProject Spaceflight, working to improve spaceflight-related articles, and have become more interested in robotic mechanisms on spacecraft as my work has become more involved in developing robotic spacecraft mechanisms for both government and commercial interests. So I am a more recent member of WikiProject Robotics. The overlap between human-initiated spaceflight and robotic-mechanisms, and especially the new commercial spaceflight capabilities and markets that are being, and will be, enabled by this technology is an intersection that should be delved into, and explored, for the benefit of improving Wikipedia articles on both topics. And, yes, I work with mechanical devices in real life: developing and prototyping new and novel robotic mechanisms for use in low-Earth orbit and cislunar space.
The project is home to 2 Featured Articles and 7 Good Articles. Have you contributed to any of these? What are the greatest challenges of preparing a robotics article for FA or GA status?
- Chaosdruid: I have done some work on the Blade Runner page, including a copyedit, but I was mainly tidying up and adding a few refs during an FA review, as it was already an FA when I started getting more involved with Wiki a few years back. I have not been involved with Hell Is Other Robots. Though both are perhaps seemingly not robotics related, Bender and Blade Runner are iconic robotic fiction.
- I have done most of my work in the lesser articles, expanding stubs and creating a few new pages. I have looked at every Robotics project article at least once since I started with the project, but we had a major year long task to assess the unassessed articles first, marking any that need work, and we are now starting to concentrate on expanding articles and getting them improved and restructuring our project categories. Hopefully there will be more GAs soon as we expand and improve the C and B class articles and try to get them promoted to GA and FA.
How much overlap is there between WikiProject Robotics and the various projects covering electronics, engineering, and software? Have there been any collaborations between these projects? What are some easy ways that members of related projects can improve robotics articles as part of the work they already do for their own project?
- Chaosdruid: There is a great deal of overlap. Most of the talk pages with a Robotics Project banner also feature other banners - computing, space, technology, and electronics being the main ones. There were collaborations in the past, something I would love to increase in the near future. Many people already improving articles include the robotics related material as they do so. Members of the project who are not-so-active nowadays, and past members, are still robotics fans and do a good job of keeping things up to date - though every little helps and it would be great if people could help by expanding robotics related material if they come across it.
Are some topics in robotics better covered than others on Wikipedia? Are any types or generations of robots neglected? What needs to be done to fill gaps in Wikipedia's coverage of robotics?
- Chaosdruid: I suppose, like most projects, those topics covered by current media will always be more covered than those that are not - especially current events such as the MSL mission reaching Mars and the successful deployment of Curiosity.
- N2e: Yes. As Chaosdruid notes, when a robotic topic hits the major worldwide news circuit, that topic will tend to be covered much better in Wikipedia. In the example s/he cites (Curiosity rover), as the Mars Science Laboratory spacecraft neared Mars in late July of this year, after an 8-month journey, it became clear that the article would become both much-viewed and rapidly-edited following a successful landing on Mars. Furthermore, there had been a couple of failed attempts to rename the article to "Curiosity rover" during the preceding months while the rover payload was in-transit on a spacecraft. So a Talkpage discussion on the MSL Talkpage ensued, and then developed consensus (a couple of weeks prior to landing), that if the landing was successful, the article would be split into two articles: 1) keep the scope of the MSL article restricted to the spacecraft and spaceflight mission, and 2) take the robotic surface science mission and robotic surface instrumentation/robot-arm/robot-rover content over to a new article entitled Curiosity rover, knowing it would grow rapidly immediately after a successful landing. Editors worked collaboratively to draft the new (yet-to-be-split) article, and made it live within ten minutes of the successful landing on 6 Aug 2012. In the event, the Curiosity rover article quickly became one of the most viewed articles on Wikipedia for a day or two after the landing.
How frequently does the project deal with notability issues regarding robotics? At what point do robots developed as university projects, amateur inventions, or commercial products warrant an article?
- Chaosdruid: Notability is a problem, especially with newer robotics topics. Many educational and government-funded projects are funded or run for three years or less, and many of those are not generally referred to in mainstream press and media. This leads to some of the ground breaking robotics projects not meeting notability, though quite often the related academics are already on Wiki. Many robots and commercial products also fall into this notability hole and, though they may exist without press for a while, they more often than not meet notability once they have been established for some time.
The project members have done a good job keeping the project's talk page clean. What are the benefits and dangers of rapidly archiving old discussions? What are some frequent questions asked by people new to the project?
- Chaosdruid: We are pretty good at answering questions, when asked, and at dealing with issues when they arise. Rapidly archiving keeps the page clean and draws attention to matters needing dealing with rather than cluttering it up with older issues that have already been dealt with. We get little repetition and older topics are rarely reposted.
- I think we get less posts than others as robotics is seen as a by-product of topics rather than being the main topic. For example take the latest Mars Rover, Curiosity. Though it is automated and is robotics related, the space project would probably get more posts than we have had.
What are the project's most urgent needs? How can editors without specialized knowledge in robotics help today?
- Chaosdruid: Members, active members - anyone who is at all interested in helping the project are welcome. We do have some support from a few academics who are prepared to give expert knowledge on articles, as well as join the assessment team when we move to our B and above assessments but, as academics are often time-restricted, any help in preparing us for our improvement drive, starting in November, would be greatly appreciated.
Anything else you'd like to add?
- Chaosdruid: We would love new members who can get involved with the up-coming project to go through all the articles and try and reassess them and expand them - there are a couple of hundred that are already tagged for work needed through our talk-page project banner (Category:Robotics articles needing attention).
- I would especially like to ask any previous members of the project who are less involved nowadays to reacquaint themselves with us, as things have changed over the past two years.
Next week's interviewees were shaken, but not stirred after their interrogation. Until then, don't let us catch you spying in the archives.
Reader comments
UK chapter rocked by Gibraltar scandal
In the second controversy to engulf Wikimedia UK in two months, its immediate past chair Roger Bamkin has resigned from the board of the chapter. The resignation last Wednesday followed a growing furore over the conflict of interest between two of Roger's roles outside the chapter and his close involvement in the Wikimedia UK board's decision-making process, including the access to private mailing lists that board members in all chapters need. But the irony surrounding Roger's resignation is its connection with efforts by Wikimedians and collaborators to strengthen the reach of Wikimedia projects through technical innovation.
The first potential conflict involves a contract between Roger Bamkin's company Victuallers Ltd and the government of the UK territory of Gibraltar, through the Gibraltar Tourist Board. The contract is to provide the enabling technology and the associated training of local participants for GibraltarpediA, a project launched just two months ago by the Gibraltar government after it signed a trademark agreement with the WMF. The slogan for the project is "Bridging Europe and Africa". A second COI issue concerns the use of the English Wikipedia's DYK process to gain front page exposure for a number of articles related to Gibraltar, including 17 in August.
- Wikipedia towns
What are quick-response (QR) codes? Central to the two Wikipedia town projects is QRpedia, a mobile Web-based system for using QR codes to deliver Wikipedia articles to visitors – often tourists – in their preferred language. Specialised plaques, each containing a unique code, are installed at locations of interest; when a visitor holds their smartphone in front of the plaque, this triggers instant access to a Wikipedia article about the location. QRpedia was conceived by Roger Bamkin and coded by Terence Eden last year, and is also in use at institutions including museums in the UK, the US, and Spain.
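As a rough illustration of the language-matching step described above, the following Python sketch picks an article URL from a phone's language preferences. It is not QRpedia's actual code: the function names, the use of the HTTP Accept-Language header as the signal, the URL pattern and the sample titles are all assumptions made for this example.

```python
from urllib.parse import quote


def parse_accept_language(header):
    """Return primary language codes from an Accept-Language header, best first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        lang, _, params = piece.partition(";")
        weight = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                weight = float(params[2:])
            except ValueError:
                weight = 0.0
        # Keep only the primary subtag, e.g. "es-ES" becomes "es".
        prefs.append((-weight, position, lang.strip().lower().split("-")[0]))
    prefs.sort()
    ordered = []
    for _, _, lang in prefs:
        if lang and lang != "*" and lang not in ordered:
            ordered.append(lang)
    return ordered


def choose_article_url(accept_language, titles_by_language, fallback="en"):
    """Pick the article in the visitor's most preferred available language.

    titles_by_language is a hypothetical mapping of language code to article
    title; it must contain the fallback language.
    """
    for lang in parse_accept_language(accept_language):
        if lang in titles_by_language:
            break
    else:
        lang = fallback
    title = titles_by_language[lang].replace(" ", "_")
    return "https://{}.wikipedia.org/wiki/{}".format(lang, quote(title))


# Illustrative data only: one plaque, two available language versions.
titles = {"en": "Rock of Gibraltar", "es": "Peñón de Gibraltar"}
print(choose_article_url("es-ES,es;q=0.9,en;q=0.8", titles))
print(choose_article_url("fr,de;q=0.5", titles))
```

A visitor whose phone prefers Spanish gets the Spanish article, while one whose preferred languages are unavailable falls back to English.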
According to Gibraltarpedia.org (which redirects to an English Wikipedia page), the more recent project "aims to cover every single notable place, person, artefact, plant and animals [sic] in Gibraltar in as many languages as possible", and will be "at least three times the size of MonmouthpediA".
However, QR tourism isn't all plain sailing. The BBC's technology news site says:
[O]nce all the landmarks are equipped with codes and all the articles are written, other factors need to be dealt with for the project to take off. Roaming charges may deter visitors from connecting to the web – and the government of Gibraltar says it is considering the possibility of free wi-fi. Also, tourists should be familiar with QR codes and be willing to use them. Although people may be used to seeing them, not many in the Western world actually scan them.
On the upside, the article says that QR technology "will be integrated into Apple's Passbook ticket/coupon wallet service, available on the forthcoming iOS6 operating system".
- DYK
However, things started to unravel with a post at the DYK talk page on 14 September regarding multiple nominations for coveted main-page exposure through that forum, in which Roger promoted an article he himself wrote. This is contrary to DYK rules, although Roger has pointed out that he rescinded the nomination. Apparently 17 Gibraltar-related nominations were pushed through during August, and it appears that the DYK procedures have been used in a way that minimises the review process and maximises the promotion to the main page of articles on this topic – chiefly by cross-nomination and cross-reviewing. This comes after a succession of disputes during the past few years about the practice by some editors of launching large numbers of nominations on the same topic-areas at DYK.
Roger told the Signpost:
John Cummings and I are not being paid to edit wikipedia. We are being paid to organise the project, enabling and helping individuals within different communities to join together, the global, virtual world of Wikipedia editors, and the people of Gibraltar and the surrounding regions in Europe and Africa. My motivation is to inspire people and organisations to acquire and distribute knowledge freely throughout the world.
I did make a mistake in creating articles on DYK – two articles this month that included a Gibraltar cave that tourists cannot go into and a WWII destroyer named after an old name for Gibraltar. I did this out of enthusiasm and interest in a new subject. I have volunteered to not edit DYK on Gibraltar related subjects.
- Jimmy's talk page
The Signpost notes that this statement was made despite the fact that paid editing is currently permissible on the English Wikipedia.
Discussion on Jimmy's talk page has since grown in size and vitriol ("whore", "witch hunt"). There have been accusations that Roger "may be slanting information in a fairly subtle way in some Gibraltar-related article[s]", and a proposal that he "suggest edits rather than making them himself on any topic related to Gibraltar". One editor wrote that "while it is possible that what Roger is doing may be legal in the most narrow of senses, it is totally unethical: it is clear that he should step down NOW from any position of trust or responsibility in any Wikimedia operation, AND should cease to edit any article where he is operating as a paid agent of the subject ...". Roger also received strong support from some editors ("One fact that I am certain of is that Roger is an honourable man, and I would expect him to be perfectly capable of giving paid advice to Gibraltar without taking on any of the editing obligations that you seem to imagine").
- Gibraltarian government statements
The situation, by now a swirling quandary concerning the relationship between WMUK, the Gibraltar government, Victuallers Ltd (and Roger's associates), and his role as a Wikipedia editor, has not been helped by statements by Gibraltar's minister for tourism, Neil Costa, as reported in the Gibraltar newspaper Vox. At the same time as encouraging Gibraltarians to open an account on Wikipedia to contribute "photos and information on the sites, history and so on", Costa apparently said, "We will have millions of people onto the GibraltarpediA once the product has spiralled. ... So one of the great decisions the Tourist Board has is effectively marketing but done at the lowest possible cost, and this is exactly what this achieves in a very revolutionised way. ... GibraltarpediA will encourage tourists to come to Gibraltar without having to do so through a package tour."
To make matters worse, Gibraltar's Director of Heritage, Professor Clive Finlayson, is reported in the Gibraltar Chronicle as noting that concern was expressed that volunteers who do not have Gibraltar's best interest at heart may write untrue or negative articles. (The continued British claim to ownership over the territory has been the subject of friction with the Spanish authorities for decades.) Finlayson said, "The people from Wikipedia UK have guaranteed to us that this has an element of self-regulation and we want to encourage many local volunteers to keep an eye on what is going on, and if things go on that is nasty, then it is very easy for them to go back to the earlier page in seconds."
- WMUK board and conflict of interest
On the same day, Chris Keating, Chair of WMUK, put out a statement on the matter, saying among other things that:
Wikimedia UK's sole involvement with [GibraltarpediA] to date has been the despatch of a few booklets. ... An agreement between Roger and Terence on the one hand and Wikimedia UK on the other is in the works, shouldn't take more than a few weeks to finish off, and will provide a firm basis for the growing use of Wikipedia-linked QR codes in future. ... Our conflict of interest policy is available here and is supported by the Declarations of Interest register here. The Conflict of Interest policy is modelled quite closely on Charity Commission guidance and is very clear that [if board members] have a conflict of interest ... they have to recuse themselves. We have followed this policy in all discussions related to the subjects mentioned in this thread. ... There is some debate on the Board about whether we need to develop this policy further, and members' views are welcome. [Links piped by the Signpost]
Roger declares his paid consultancies for both Monmouthshire county council and the Gibraltar government; this includes a statement that "there is no known COI as WMUK does not have a relationship with this Government but it is hoped that one may develop." A press release by the board last Friday states:
Roger has always been open with Wikimedia UK about his commercial interests and has declared them in public at appropriate times. He has not voted in any Wikimedia UK decisions about Monmouthpedia since the start of his consultancy relationship with MCC or on any decisions about Gibraltarpedia or QRpedia. ... Roger has not received any Wikimedia UK funds for any of these projects, except for out-of-pocket expenses incurred in his role as a volunteer in the early development stages of Monmouthpedia before becoming a consultant, paid in line with our normal expenses policy.
However, a member of WMUK has told the Signpost he believes the board is naive about conflict of interest, and that all chapters and the foundation need to learn lessons from this scenario. It is not good enough, he said, to disclose potential conflicts and to have COI policies if people in leadership positions don't understand COI.
This appears to be confirmed by the fact that by 30 June, Roger had already offered his resignation to the board twice, clearly perceiving that there might be a COI in his emerging extra roles. A single diff, then, is evidence that the problem is systemic, and at least partly exonerates Roger from responsibility for COI – at least in relation to his continued board membership. Further, he stated on 19 September on Jimmy's page: "When I stood for the board last time I clearly made the point that I would have COI issues but I wouldn't have undeclared COI issues."
That the problem might be systemic resonates with recently blogged complaints by ex-WMUK treasurer Thomas Dalton that for WMUK "too much happens without proper thought and oversight, which has resulted in serious mistakes being made"; and that the chapter needs to give itself "the time to think about where we are and where we are going otherwise everything will spiral out of control".
The controversy has already received coverage in the press and online, including stories by the notorious FoxNews ("Jimmy Wales 'disgusted' as trustee accused of editing for profit"), PCWorld ("Wikipedia contributors debate whether it's okay to pay for posts"), and "Corruption in Wikiland? Paid PR scandal erupts at Wikipedia" and "Wikipedia honcho caught in scandal quits, defends paid edits" by CNET tech writer Violet Blue, among dozens of other outlets that together represent significant publicity value for Roger Bamkin's IT consultancy.
- QR and the Foundation's trademark
The Signpost asked Geoff Brigham, the foundation's chief counsel, whether the foundation has any formal relationship with the Gibraltar Tourist Board:
The Wikimedia Foundation signed a trademark agreement with the government of Gibraltar, as represented by the Gibraltar Tourist Board, for a limited term use (one year) of the Wikipedia trademarks as part of the Gibraltarpedia project. As with most trademark agreements, the Foundation protects its marks by a detailed license which among other things, requires compliance with any reasonable requests of the Foundation, as well as with the Foundation’s Trademark Policy. This ensures that use of the marks upholds the reputation of the Foundation and limits confusion as to affiliation, and enables the Foundation to end relationships where there has been a material breach of the agreement or where use of the mark is out of line with the Foundation's mission.
We understand that QR plaques are being used in the UK, the US, India, Germany, Spain, Russia, Serbia, Estonia, Australia, and Hungary. Usage appears to be encouraged by a how-to page, complete with a gallery of examples that include the WMF trademark. Nowhere on that page or WikiProject QRpedia is there mention of the need to obtain trademark agreements from the WMF to use the Wikimedia trademarks on QRpedia installations.
The Signpost asked Geoff Brigham whether the foundation has a legal agreement concerning all uses of its logo on the plaques that are enabling components of the QRPedia technology:
There are no legal agreements in place between the Wikimedia Foundation and QRPedia. We would encourage anybody using Wikimedia trademarks for plaques to contact us so we can review and hopefully give approval in appropriate cases that advance our mission.
We have had several email exchanges with Roger, who pointed out the enormous advantages to the movement that are likely to flow from the innovations for which he and his colleagues have largely been responsible.
In brief
- Wikimedia India board elected: The Indian chapter has announced the result of their recent board election, with five people elected out of nine candidates: Karthik Nadar, Nikita Belavate, Pranav Curumsey, Srikanth Ramakrishnan, and Viswa Prabha.
- English Wikipedia:
- Arbitration Committee: Three clarification and amendment requests are open. The oldest is an amendment request for the Sathya Sai Baba 2 case, while two new clarification requests involve Date delinking and Palestine-Israel. There is also a new request for arbitration regarding psychotherapies, which seems likely to be accepted if discussion breaks down again.
- Main-page redesign competition: 24 proposals have been lodged, and discussion about the issue continues on the competition talk page. Editors are welcome to submit their own proposals until 30 September.
- WMF RfC: An RfC on whether to establish a legal fees assistance program for volunteer role-specific risks that go beyond the contributor defense policy already in place is open for comments.
- Milestones: Wikimedia Commons, the file hosting site for Wikimedia Foundation projects, has reached 14,000,000 files, just 109 days after reaching 13,000,000.
Reader comments
Signpost investigation: code review times
Code review figures mixed, improving
Late last month, the "Technology report" included a story using code review backlog figures – the only code review figures then available – to construct a rough narrative about the average experience of code contributors. This week, we hope to go one better, by looking directly at code review wait times, and, in particular, median code review times.
To this end, the Signpost independently analysed data from the first 23,900 changesets as they stood on September 17, incorporating some 66,500 reviews across 32,100 patchsets. From this base, changes targeted at branches other than the default "master" branch were discarded, as were changesets submitted and reviews performed by bots. Self-reviews were also discarded, but reviews made by a different user in the form of a superseding patch were retained. Finally, users were categorised by hand according to whether they would be best regarded as staff or volunteers.[nb 1] Although this week's article focuses mainly on so-called "core" MediaWiki code, future issues will probe extension-related statistics.
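The filtering rules in this paragraph can be sketched in Python as follows. This is an illustration only, not the script the Signpost used: the `Patchset` and `Review` records, their field names, the bot account name and the branch name in the sample data are all hypothetical. The treatment of a superseding patch (its upload time counts as a review of the previous patchset when the uploader is a different user) is one reading of the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Patchset:  # hypothetical record layout
    changeset: int
    number: int
    branch: str
    author: str
    uploaded: datetime


@dataclass(frozen=True)
class Review:  # hypothetical record layout
    changeset: int
    patchset: int
    reviewer: str
    posted: datetime


def first_review_waits(patchsets, reviews, bots):
    """Map (changeset, patchset number) to the wait before its first review."""
    # Keep only human-authored patchsets targeting the default branch.
    kept = {
        (p.changeset, p.number): p
        for p in patchsets
        if p.branch == "master" and p.author not in bots
    }
    review_times = {}
    for r in reviews:
        key = (r.changeset, r.patchset)
        p = kept.get(key)
        # Discard bot reviews and self-reviews.
        if p is None or r.reviewer in bots or r.reviewer == p.author:
            continue
        review_times.setdefault(key, []).append(r.posted)
    # A superseding patchset from a different user counts as a review.
    for (changeset, number), p in kept.items():
        previous = kept.get((changeset, number - 1))
        if previous is not None and p.author != previous.author:
            review_times.setdefault((changeset, number - 1), []).append(p.uploaded)
    return {
        key: min(times) - kept[key].uploaded
        for key, times in review_times.items()
    }


t0 = datetime(2012, 8, 1, 12, 0)
hour = timedelta(hours=1)
sample_patchsets = [
    Patchset(1, 1, "master", "alice", t0),
    Patchset(1, 2, "master", "bob", t0 + 2 * hour),  # supersedes alice's patch
    Patchset(2, 1, "REL1_20", "carol", t0),          # not the default branch
    Patchset(3, 1, "master", "jenkins-bot", t0),     # bot-authored
    Patchset(4, 1, "master", "dave", t0),            # only ever self-reviewed
]
sample_reviews = [
    Review(1, 1, "alice", t0 + timedelta(minutes=10)),  # self-review
    Review(1, 1, "erin", t0 + 3 * hour),
    Review(1, 2, "alice", t0 + 5 * hour),
    Review(2, 1, "erin", t0 + hour),
    Review(4, 1, "dave", t0 + hour),
]
waits = first_review_waits(sample_patchsets, sample_reviews, bots={"jenkins-bot"})
```

Patchsets that never receive a qualifying review simply drop out of the result.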
WMF bosses will, on the whole, be pleased with the final figures. 50% of revisions to core MediaWiki code submitted during August were reviewed for the first time in just 3 hours 30 minutes, with 25% being reviewed in 20 minutes and 75% within 27 hours. These figures were similar across both first patchsets and later amendments, and similar with regard to slight changes in what qualified as "a review".[nb 2] The relevant trend over time is considered in the following tables. The first table gives review times across all patchsets submitted to core; the second includes just the first patchset in any given changeset.
All patchsets submitted to core:
Month | 25% | Median | 75% | Current mean |
---|---|---|---|---|
May | 42 minutes | 4 hours and 25 minutes | 1 day, 11 hours and 27 minutes | 3 days, 3 hours and 38 minutes |
June | 47 minutes | 19 hours and 10 minutes | 3 days, 16 hours and 45 minutes | 5 days, 8 hours and 29 minutes |
July[nb 3] | 39–40 minutes | 7 hours and 4–8 minutes | 2 days, 5–9 hours | 2 days, 16 hours and 38 minutes |
August[nb 3] | 20–21 minutes | 3 hours and 11–29 minutes | 21–24 hours | 1 day, 11 hours and 52 minutes |

First patchset in each changeset only:
Month | 25% | Median | 75% | Current mean |
---|---|---|---|---|
May | 38 minutes | 3 hours and 27 minutes | 1 day, 5 hours and 4 minutes | 2 days, 1 hour and 58 minutes |
June | 45 minutes | 12 hours and 34 minutes | 2 days, 13 hours and 31 minutes | 3 days, 7 hours and 39 minutes |
July | 22 minutes | 3 hours and 16 minutes | 1 day, 7 hours and 21 minutes | 1 day, 17 hours and 18 minutes |
August | 19 minutes | 3 hours and 33 minutes | 19 hours and 50 minutes | 1 day, 1 hour and 26 minutes |
The data show, then, that there has been a marked improvement in getting followup patchsets reviewed more quickly, while review times for "first attempt" patchsets have improved less dramatically. Other analyses are more concerning. For example, a volunteer-written patchset waits, on average (whether median or mean), twice as long as a staff-written one for its first review, although the gap has closed from three times as long in June and July. Staff provide 86% of first reviews for core, with just five staff members collectively accounting for some 55% of the total.[nb 4] Moreover, even in August, more than 5% of patchsets targeted at core waited a week for their first review.
As with all large datasets, it is difficult to rule out subtle methodological issues, and in any case it is less than ideal to pinpoint trends over as short a period as four months. The full dataset is available upon request.
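To make the table columns concrete, here is a minimal Python sketch of how the 25%, median, 75% and mean figures could be computed from a list of waiting times, and of one plausible way to obtain the ranges flagged in note 3. The sample values, the function names and the bounding method are all assumptions made for illustration; the report does not describe the actual calculation.

```python
from statistics import mean, median, quantiles


def review_summary(wait_minutes):
    """Summarise first-review waits (in minutes) like one table row."""
    q1, q2, q3 = quantiles(wait_minutes, n=4, method="inclusive")
    return {"25%": q1, "median": q2, "75%": q3, "mean": mean(wait_minutes)}


def median_range(reviewed, pending_so_far):
    """Bound the median while some patchsets still await review.

    Assumed method (not stated in the source):
    low  = pending patchsets are reviewed right now, so their wait so far counts;
    high = pending patchsets wait indefinitely.
    """
    low = median(reviewed + pending_so_far)
    high = median(reviewed + [float("inf")] * len(pending_so_far))
    return low, high


# Hypothetical waits in minutes; these are NOT the Signpost's figures.
reviewed_waits = [5, 20, 45, 210, 600, 1620, 10080]
summary = review_summary(reviewed_waits)
low, high = median_range(reviewed_waits, pending_so_far=[300])
```

With this invented sample, a single week-long wait pulls the mean well above the 75% mark, which is one reason to read the quartiles alongside the mean rather than the mean alone.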
Notes
- ^ One notable side effect of the methodology employed was the exclusion from the final analysis of patches that were abandoned or amended without review, even if the user had intended for them to be so reviewed and/or the amendments were minimal. Future analyses may wish to refine their methodology to take this into account.
- ^ Specifically, the in/exclusion of reviews not assigning numerical scores and the in/exclusion of force abandonments and reversions.
- ^ a b Ranges indicate the possible impact of patchsets still awaiting a review.
- ^ The equivalent figures for core plus WMF-deployed extensions are 95% and 43% respectively.
In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
- HTML5 problems and tolerating invalidity: After initial reports last week of problems with cell alignment, this week brought fresh reports of HTML5 problems centred on Wikipedia "pushpin" maps. The maps, which show given co-ordinates overlaid on a background map, are highly sensitive to single-figure changes in display position, which seems to be the source of the present difficulties. All broken templates are expected to be fixed by hand, although it is unclear how easily or reliably this small degree of brokenness can be identified. In related news, a consensus appeared to form on the wikitech-l mailing list in support of outputting "invalid" (but still functioning) HTML5 in preference to subjecting it to an automated but error-prone conversion.
- Search data published, retracted: The publication of "anonymous search log files for Wikipedia and its sister projects" was halted and reversed on Wednesday after the anonymised data was found to contain a limited amount of personal information (Wikimedia blog). Email addresses, credit card numbers and social security numbers had previously been removed from the dataset, which was intended to (among other things) "provide valuable feedback to our editor community, who can use it to detect topics of interest that are currently insufficiently covered". The team behind the release does not yet know when the same data might be released in a more fully anonymised form; there was also no indication in the blog post of how many copies are thought to be circulating.
- French Wikipedia told: do worry about performance: The recent creation of a series of templates designed to map area codes to values on the French Wikipedia had a significant detrimental effect on overall performance, reports Lead Performance Architect Tim Starling. The templates, which provoked the first direct WMF intervention in template content in recent times, used a switch statement that included nearly 40,000 items, all of which had to be loaded into memory on each template invocation. Using Lua, due for release in January, would partially ameliorate the situation, it was reported. "At some point in the future, you may be able to put this kind of geographical data in Wikidata. Please, template authors, wait patiently", wrote Starling, while Wikidata Project Director Denny Vrandečić added that he was "at the same time delighted ... by the capability and creativity of the Wikipedia community to solve such tasks with MediaWiki template syntax and ... horrified by the necessity of the solution taken".
- Your preference over preferences?: The thorny issue of rethinking the options available at Special:Preferences was raised again this week with the creation of an RfC looking at the issue (wikitech-l mailing list). Even slight amendments to certain preferences have previously caused controversy; for example, the removal of preferences tends to provoke anger among those users reliant upon them, while the renaming or repositioning of preferences tends to cause confusion. The work links in with moves to create "global" gadgets, which could then replace static preferences such as alterations to link colours.
- Results in from first "post-edit feedback" trial: The results from the first trial of post-edit notifications have now been published on the Wikimedia blog. The trial, undertaken by the Foundation's "Editor Engagement Experiments" (E3) team, displayed the messages to new editors after every save, finding an overall increase in quantity of edits and no change in editor quality. The same team is now testing messages shown only after "milestone" edit counts have been reached, to compare how well each approach entices new editors to edit more. In related news, the Foundation's "page curation" project saw its first stable release this week, with users given the green light to use Special:NewPagesFeed and its toolbar on a day-to-day basis.
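The French Wikipedia item above turns on the cost of loading a very large lookup on every template invocation. The Python sketch below illustrates that general pattern only: it contrasts rebuilding a 40,000-entry table for each lookup with building it once and reusing it. It does not model the MediaWiki parser, the actual templates or Lua; the table contents, function names and load counter are hypothetical.

```python
from functools import lru_cache

LOADS = 0  # counts how often the full table is materialised


def load_table():
    """Build a made-up table of 40,000 area codes (hypothetical data)."""
    global LOADS
    LOADS += 1
    return {f"{n:05d}": n for n in range(40_000)}


def lookup_reloading(code):
    """Rebuild the whole table for every single lookup."""
    return load_table().get(code)


@lru_cache(maxsize=1)
def cached_table():
    """Build the table on first use, then hand back the same object."""
    return load_table()


def lookup_cached(code):
    """Look the code up in the table that was built once."""
    return cached_table().get(code)


codes = ("00001", "12345", "39999")

for code in codes:
    lookup_reloading(code)
reloading_loads = LOADS  # one full load per lookup

LOADS = 0
for code in codes:
    lookup_cached(code)
cached_loads = LOADS  # a single load shared by all lookups
```

The per-lookup work is the same in both variants; what differs is how often the 40,000 entries are brought into memory, which is the cost the report singles out.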
Dead as...
Featured articles
Fourteen featured articles were promoted this week:
- Eraserhead (nom) by Grapple X. Eraserhead is a 1977 surrealist horror film written and directed by American filmmaker David Lynch. The black-and-white film was Lynch's first feature film and was produced with the support of the American Film Institute. It tells the story of Henry Spencer, who is left to care for his deformed child in a desolate industrial landscape. The film, shot over a period of several years, was initially ignored but gained popularity through long runs as a midnight movie.
- M-553 (Michigan highway) (nom) by Imzadi1979. M-553 is a north–south state trunkline highway in the Upper Peninsula of the US state of Michigan. It connects M-35 near Gwinn with the Marquette Bypass, ultimately linking Marquette to Sawyer International Airport. The road, originally called County Road 553, dates back to the 1930s and was transferred to the jurisdiction of the Michigan Department of Transportation (MDOT) in 1998, preceding a change in designation.
- Dodo (nom) by FunkMonk. The Dodo (Raphus cucullatus) is an extinct flightless bird that was endemic to the island of Mauritius. It was about one metre (3.3 ft) tall and may have weighed 10–18 kg (22–40 lb) in the wild, although its exact appearance remains a mystery. The bird was first recorded in 1598 and quickly fell prey to sailors and introduced animals. The last credible recorded sighting of a Dodo was in 1662, and its extinction called attention to the problem of human involvement in the disappearance of entire species.
- Phallus indusiatus (nom) by Sasata. Phallus indusiatus is a fungus in the family Phallaceae, or stinkhorns. It is found in tropical areas on several continents, where it grows in woodlands and gardens in rich soil and well-rotted woody material. The fruit body is characterised by its conical to bell-shaped cap on a stalk and hanging "skirt". Mature fruit bodies are up to 25 cm (9.8 in) tall and the cap is covered in a greenish-brown spore-containing slime. The mushroom is found in the cuisine and mythology of several countries.
- Ian Fleming (nom) by Schrodinger's cat and Cassianto. Fleming (1908–1964) was an English author, journalist and Naval Intelligence Officer, best known for his 14 James Bond spy novels, which drew on his experience in intelligence operations. The book series sold well and spawned 24 films, the second-highest-grossing film series of all time. Fleming's other works include a children's story and two works of non-fiction.
- Andjar Asmara (nom) by Crisco 1492. Andjar (1902–1961) was an Indonesian dramatist and film director. He started his career as a journalist, first in Batavia then in Padang, where he became active in theatre. By the 1930s he had joined a touring troupe as their writer, penning several stage plays for them. He began his career as a director in 1940, making six films in ten years. Andjar spent the remainder of his life as a film and theatre critic, although he continued to write screenplays.
- Sadie Harris (nom) by TRLIJC19 and Sofffie7. Harris is a character on the American television medical drama Grey's Anatomy, created by Shonda Rhimes and portrayed by Melissa George. In the show, Harris serves as a medical intern and old friend of the series' protagonist Meredith Grey. Cast after appearing in In Treatment, George was contracted for several episodes. Critics have described her as "naughty", "mischievous", and "nutty".
- HMS Furious (47) (nom) by Sturmvogel 66. Furious was a modified Courageous-class battlecruiser built for the Royal Navy during the First World War. During construction, the ship was very lightly armoured and armed with only a few heavy guns. It was later converted to an aircraft carrier and used for test runs and, later, hunting for German raiders. Its aircraft also served to protect ground troops and harass targets in and around Europe. The ship was decommissioned in 1945.
- System Shock 2 (nom) by Hahc21. System Shock 2 is a 1999 first-person action role-playing video game for Microsoft Windows designed by Ken Levine and co-developed by Irrational Games and Looking Glass Studios. The game follows a lone soldier trying to stem the outbreak of a genetic infection that has devastated a spaceship and, although initially meant as a stand-alone title, was later brought into the System Shock universe. It has been considered highly influential, although it underperformed in the market.
- Microsoft Security Essentials (nom) by Codename Lisa. Microsoft Security Essentials is an antivirus software product that provides protection against malware on several Windows platforms. Building on other Microsoft security products, the software was criticised by antivirus companies as deficient and a possible violation of competition law. Released in 2009, the product is (as of 2012) one of the most popular antivirus products.
- The Hunger Games (nom) by Evanh2008. The Hunger Games is a 2008 young adult novel by American writer Suzanne Collins that follows a young female contestant in a televised fight-to-the-death among 24 children. The book, based in part on Greek mythology and reality television, was generally well received and earned several prizes. It has been followed by two sequels and a 2012 film adaptation.
- Augustinian theodicy (nom) by ItsZippy. The Augustinian theodicy is a philosophy defending the Christian premise of an all-powerful and perfectly loving God despite evidence of evil in the world. It was first developed by Augustine of Hippo, who rejected the idea that evil exists in itself, instead regarding it as a corruption of goodness. Numerous thinkers have responded to Augustine's theodicy, both to build on it and criticise it.
- Madagascar (nom) by Lemurbaby. Madagascar is an island country in the Indian Ocean off the southeastern coast of Africa. It is a biodiversity hotspot; over 90 percent of its wildlife is found nowhere else on Earth. The nation comprises the island of Madagascar as well as numerous peripheral islands and is estimated to have just over 22 million people (as of 2012), most of whom live on less than two dollars a day.
- Olga Constantinovna of Russia (nom) by DrKiernan. Olga Constantinovna (1851–1926) was the queen consort of King George I of Greece and, briefly in 1920, queen regent of Greece. A member of the Romanov dynasty, she married George I at the age of sixteen and soon became involved in charitable work. She was exiled from Greece several times, but her descendants served as kings of Greece.
Featured lists
Six featured lists were promoted this week:
- Grade I listed churches in Merseyside (nom) by Peter I. Vardy. Merseyside, a metropolitan county in North West England, is home to fourteen Grade I listed churches. The Grade I listed churches in Merseyside mostly date from the Medieval period to the 19th and 20th centuries.
- Bookseller/Diagram Prize for Oddest Title of the Year (nom) by GOP. The Bookseller/Diagram Prize for Oddest Title of the Year is a humorous British literary award that is given annually to the book with the oddest title. The prize has been awarded every year but two since 1978 by The Bookseller.
- List of Arrested Development episodes (nom) by Wikipedical. The American television sitcom Arrested Development comprised 53 episodes during its 2003–2006 run. The show centers on the Bluth family, a formerly wealthy, habitually dysfunctional family, and is presented in a continuous format.
- List of The X-Files episodes (nom) by TBrandley. The American science fiction television series The X-Files saw 202 episodes during its 1993–2002 run. The show initially centered on FBI special agents Fox Mulder and Dana Scully, who work on cases linked to the paranormal.
- List of international cricket five-wicket hauls at Brabourne Stadium (nom) by AroundTheGlobe. Brabourne Stadium, a cricket ground in Mumbai, India, has seen 14 five-wicket hauls (where a bowler takes five or more wickets in a single innings), nearly all in Test cricket, since 1949.
- Wiz Khalifa discography (nom) by Sufur222. The American rapper Wiz Khalifa has released 4 albums, 10 mixtapes, 29 singles, and 24 music videos. His most successful thus far has been Rolling Papers, which was certified gold in the US.
Featured pictures
Five featured pictures were promoted this week:
- Dendrobates azureus (nom; related article), created by Quartl and nominated by Tomer T. Dendrobates azureus is a species of poison dart frog found in Suriname and Brazil. Named for its azure skin, it may lose its toxicity in captivity.
- Israel–Egypt barrier (nom; related article), created by Idobi1 and nominated by Tomer T. The Israel–Egypt barrier is being built by Israel along the country's border with Egypt. Initially meant to curb the influx of illegal immigrants, its security is being upgraded.
- John Henry Newman (nom; related article), created by John Everett Millais and nominated by Spongie555. Newman (1801–1890) was an English religious figure. First an Anglican and leader in the Oxford Movement, he converted to Catholicism and rose to the rank of cardinal.
- Kallima inachus (nom; related article), created by Quartl and nominated by Tomer T. Kallima inachus is a nymphalid butterfly found in tropical Asia from India to Japan. With its wings closed, it closely resembles a dry leaf with dark veins.
- Synthetic bismuth crystal (nom; related article), created by Alchemist-hp and nominated by Mediran. Crystals of the element bismuth can be formed artificially in a lab or at home, with varying quality. These crystals are very colourful and have a "hoppered" shape.
Image filter; HotCat; Syntax highlighting; and more
Current discussions on the English Wikipedia include:
Proposals
- Adding broad topic box to main page
- A new section on the main page has been suggested to encourage editing of more topics. This box would contain broad topics that may persuade a reader to become an editor. Some argue that it will not serve its intended purpose.
- Personal Image Filter
- An image filter that lets users toggle images on and off is being proposed. So far users have supported the idea but have yet to agree upon a way of achieving it.
- Main page redesign
- A competition has begun to find a better alternative to the current main page, which was last redesigned in March 2006.
- Enable HotCat for all
- It has been suggested that HotCat be turned on for all users to aid in the process of categorizing Wikipedia. Arguments against the proposal include the lack of a double-check feature and the absence of any way for administrators to remove the tool from someone who abuses it.
- Simplify the edit window
- Due to upcoming changes, the editing interface is being revamped; your input is welcome in deciding how to edit and move elements on the screen.
Requests for comment
- Turning on the Memento Extension
- The Memento extension makes it easier to access historical versions of articles.
- Wording for Requested Moves policy
- Clarification is requested on wording regarding the Requested Moves policy and Article Titles.
- Categorization of persons
- Should biographies be further categorized to include genetic and cultural heritage, faith or sexual orientation?
- Syntax highlighting
- A discussion of different forms of highlighting has been started to find out what should be highlighted and if it should be enabled by default.
- City population templates
- Do the city population templates add value to articles and should they include images?